DEVIATION INEQUALITIES FOR SUMS OF WEAKLY DEPENDENT TIME SERIES



OLIVIER WINTENBERGER 



Abstract. In this paper we give new deviation inequalities of Bernstein's type for the partial sums of weakly dependent time series. The loss from the independent case is studied carefully. We give non-mixing examples, such as dynamical systems and Bernoulli shifts, for which our deviation inequalities hold. The proofs are based on the blocks technique and different coupling arguments.

1. Introduction

The aim of this paper is to extend the deviation inequality of Bernstein's type from the independent case to some weakly dependent ones. We consider a sample $(X_1,\dots,X_n)$ of a stationary process $(X_t)$ in a metric space $(\mathcal X,d)$. Considering the set $\mathcal F$ of 1-Lipschitz functions from $\mathcal X$ to $[-1/2,1/2]$, we are interested in the deviation of the partial sum $S(f)=\sum_{i=1}^n f(X_i)$ for any $f\in\mathcal F$, assuming that $E(f(X_1))=0$. If the $X_i$ are independent and if $\sigma_k^2(f)=k^{-1}\mathrm{Var}(\sum_{i=1}^k f(X_i))$, the classical Bernstein inequality gives the deviation estimate, see Bennett [3]:

(1.1) $P\big(S(f) \ge \sqrt{2n\sigma_1^2(f)x} + x/6\big) \le e^{-x}$ for all $x > 0$.
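As a quick sanity check of (1.1) in the independent case, the following sketch simulates sums of iid centered variables bounded by 1/2 and compares the empirical tail with $e^{-x}$. The uniform distribution on $[-1/2,1/2]$, the sample sizes and the function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def bernstein_threshold(n, var1, x):
    """Threshold appearing in (1.1): sqrt(2 n sigma_1^2 x) + x / 6."""
    return math.sqrt(2.0 * n * var1 * x) + x / 6.0


def empirical_tail(n, x, trials, seed=0):
    """Monte Carlo estimate of P(S >= threshold) for iid Uniform[-1/2, 1/2] terms."""
    rng = random.Random(seed)
    var1 = 1.0 / 12.0  # variance of Uniform[-1/2, 1/2] (assumed toy distribution)
    threshold = bernstein_threshold(n, var1, x)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() - 0.5 for _ in range(n))
        if s >= threshold:
            hits += 1
    return hits / trials


tail = empirical_tail(n=200, x=2.0, trials=2000)
bound = math.exp(-2.0)  # right-hand side e^{-x} of (1.1)
```

In this toy setting the empirical frequency should stay below the bound, which is expected since Bernstein's inequality is conservative.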

This inequality reflects the gaussian approximation of the tail of $S(f)$ for small values of $x$, and the exponential approximation of the tail of $S(f)$ for large values of $x$. This deviation inequality is very useful in statistics, see for example the monographs of Catoni [5] and of Massart [18].

To extend such a deviation inequality to the dependent case, a tradeoff has to be made between the sharpness of the estimates and the generality of the context. Estimates as sharp as in the independent case (up to constants) are obtained for Markov chains in Lezaud [17] and in Joulin and Ollivier [16] under granularity. Bertail and Clemencon [4] obtain a deviation inequality for recurrent Markov chains: there exists $C > 0$ such that for all $M > 0$ and all $x > 0$:

$P\big(S(f) \ge C(\sqrt{n\sigma^2(f)x} + Mx)\big) \le e^{-x} + nP(\tau_1 > M),$

where the $\tau_i$ are the iid regeneration times and $\sigma^2(f) = E(\tau_1)^{-1}\mathrm{Var}(\sum_{i=1}^{\tau_1} f(X_i))$. Up to a constant, it is the limit variance in the CLT of $S(f)$, more natural than $\sigma_1^2(f)$ in (1.1). This primitive estimate of the tail is natural as, through the splitting technique of Nummelin [21], the partial sums $S(f)$ are sums of iid blocks of size $\tau_i$. If the regeneration times are bounded, then up to different constants the same estimate as in the iid case is obtained. If the regeneration times admit finite exponential moments, fixing $M$ of order $\ln n$, Adamczak [1] obtains estimates of the deviations with a constant $C > 0$:

$P\big(S(f) \ge C(\sqrt{n\sigma^2(f)x} + \ln n\, x)\big) \le e^{-x}$ for all $x > 0$.

A loss of rate $\ln n$, which cannot be reduced, appears in the exponential approximation compared with the iid case, see Adamczak [1] for more details.

In all these works, the strong Markov property is crucial. To bypass the Markov assumption, one way is to use dependence coefficients. Ibragimov [14] introduced the uniformly $\phi$-mixing coefficients. In this setting, Samson [24] achieves the deviation inequality (1.1) with different constants as soon as $\sum_r \sqrt{\phi_r} < \infty$. Less accurate results have been obtained for more general mixing coefficients than the $\phi$-mixing ones: Viennet [25] for absolutely regular mixing and Merlevede et al. [19] for geometrically strongly mixing. Recently, mixing coefficients have been extended to weakly dependent ones, see Doukhan and Louhichi [11] and Dedecker and Prieur [8]. Under the exponential decrease of these coefficients, deviation inequalities for $S(f)$ with a loss in the exponential approximation are given in Doukhan and Neumann [12]; Merlevede et al. [20] extend these results to the partial sum $S(f)$ for unbounded functions $f$.

The dependence context of this paper is the one of the so-called $\varphi$-weakly dependent coefficients introduced by Rio in [22] to extend the uniformly $\phi$-mixing coefficients. We provide new deviation inequalities for non-mixing processes, such as dynamical systems called expanding maps, see Collet et al. [6], and continuous functions of Bernoulli shifts. The Bernstein-type deviation inequality in these non-mixing contexts sharpens the existing ones. The deviation inequality is obtained by dividing the sample $(X_1,\dots,X_n)$ into blocks $(X_i,\dots,X_{i+k^*})$, where the length $k^*$ must be carefully chosen, and then by approximating non-consecutive blocks by independent blocks using a coupling scheme.

The coupling scheme follows from a conditional Kantorovitch-Rubinstein duality due to Dedecker et al. [9] and detailed in Section 2. Using this coupling argument, a new deviation inequality can be stated in Section 3:

(1.2) $P\big(S(f) \ge 5.8\sqrt{n\bar\sigma^2_{k^*}(f)x} + 1.5\,k^*x\big) \le e^{-x}$ for all $x > 0$,

with $\bar\sigma_j^2(f) = \sup_{j \le k \le n}\sigma_k^2(f)$ for all $1 \le j \le n$ and $k^* = \min\{k \ge 1;\ k\delta_k \le \bar\sigma_k^2(f)\}$, where $(\delta_k)$ only depends on the $\varphi$-coefficients, see condition (3.1) for more details. Unlike $\sigma_1^2(f)$ in (1.1), the variance term $\bar\sigma^2_{k^*}(f)$ is natural as it tends to the limit variance in the CLT with $k^*$. When the CLT holds, i.e. $\sigma_k^2(f) \to \sigma^2(f) > 0$, the classical Bernstein inequality (1.1) holds up to constants with $\sigma_1^2(f)$ replaced by $\sigma^2(f)$, see Subsection 3.3 for more details. On the contrary, if the functionals $f_n$ are such that $\sigma^2(f_n) \to 0$, then for an exponentially decreasing rate of the $\varphi$-coefficients, $k^*$ is of order $-\ln(\sigma_1^2(f_n))$ and a logarithmic loss appears. As in the recurrent Markov chain case, the loss in the exponential approximation depends on the size of the blocks $k^*$. We do not know if this loss in the exponential domain may be reduced for such non uniformly $\phi$-mixing sequences.

In many practical examples, such as chains with infinite memory introduced by Doukhan and Wintenberger [13], Bernoulli shifts and Markov kernels, an $L^\infty$ coupling scheme is tractable, see Section 5 for a detailed definition. In these specific cases of $\varphi$-weakly dependent sequences, an improved version of the deviation inequality (1.2) is given in Theorem 5.1:

$P\big(S(f) \ge 2\sqrt{n\sigma^2_{k^{*\prime}}(f)x} + 1.34\,k^{*\prime}x\big) \le e^{-x}$ for all $x > 0$,

with $k^{*\prime} = \min\{1 \le k \le n \,/\, n\delta'_k \le kx\}$, where $(\delta'_k)$ only depends on the $L^\infty$ coupling scheme, see condition (5.1) for more details. The paper finishes with the proofs collected in Section 6.

2. Preliminaries: coupling and weak dependence coefficients 

Let $(X_1,\dots,X_n)$ with $n \ge 1$ be a sample of random variables on some probability space $(\Omega,\mathcal A,P)$ with values in a metric space $(\mathcal X,d)$. We assume in all the sequel that for any $n \ge 1$ there exists a strictly stationary process $(X_t^{(n)})$ such that $(X_1,\dots,X_n) = (X_1^{(n)},\dots,X_n^{(n)})$. Let us consider $\mathcal F$ the set of measurable functions $f:\mathcal X \to \mathbb R$ satisfying:

(2.1) $|f(x) - f(y)| \le d(x,y)$, $\forall (x,y) \in \mathcal X\times\mathcal X$ and $\sup_{x\in\mathcal X}|f(x)| \le 1/2$.

We denote the partial sum $S(f) = \sum_{i=1}^n f(X_i)$ and $\mathcal M_j = \sigma(X_t;\ 1 \le t \le j)$ for all $1 \le j \le n$.

2.1. Kantorovitch-Rubinstein duality. The technique of coupling is related to the Kantorovitch-Rubinstein duality. The duality states that given two distributions $P$ and $Q$ on $\mathcal X$ there exists a random couple $Y = (Y_1,Y_2)$ with $Y_1 \sim P$ and $Y_2 \sim Q$ satisfying

$E(d(Y_1,Y_2)) = \sup_{f\in\Lambda_1}\Big|\int f\,(dP - dQ)\Big| = \inf_{Y'} E(d(Y_1',Y_2')),$

where the $Y'$ have the same margins as $Y$ and $\Lambda_1$ denotes the set of 1-Lipschitz functions, i.e. such that $|f(x)-f(y)| \le d(x,y)$.

Dedecker, Prieur and Raynaud de Fitte [9] extend the classical Kantorovitch-Rubinstein duality to the time series framework by considering it conditionally on some $\sigma$-algebra $\mathcal M$ of $\mathcal A$. Assuming that the original space is rich enough, i.e. there exists a random variable $U$ uniformly distributed over $[0,1]$ and independent of $\mathcal M$, for any $Y_1 \sim P$ with values in a Polish space there exists a random variable $Y_1^* \sim P$ independent of $\mathcal M$ satisfying

(2.2) $E(d(Y_1,Y_1^*) \mid \mathcal M) = \sup\{|E(f(Y_1)\mid\mathcal M) - E(f(Y_1))|,\ f \in \Lambda_1\}$ a.s.

2.2. $\varphi$-weak dependence coefficients and coupling schemes. Let us recall the weak dependence coefficient $\varphi$ introduced in Rio [22].

Definition 2.1. For any $X \in \mathcal X$, for any $\sigma$-algebra $\mathcal M$ of $\mathcal A$,

$\varphi(\mathcal M,X) = \sup\{\|E(f(X)\mid\mathcal M) - E(f(X))\|_\infty,\ f \in \mathcal F\}.$

Another equivalent definition is

(2.3) $\varphi(\mathcal M,X) = \sup\{|\mathrm{Cov}(Y,f(X))|,\ f \in \mathcal F$ and $Y$ is $\mathcal M$-measurable and $E|Y| = 1\},$

see [8].

We will denote by (A) the specific case where $\mathcal X$ is a Polish space with $\sup_{(x,y)\in\mathcal X^2} d(x,y) \le 1$. In the case (A) we have $\varphi(\mathcal M,X) = \tau_\infty(\mathcal M,X)$ where $\tau_\infty$ is the coupling coefficient defined in [7] by the relation

$\tau_\infty(\mathcal M,X) = \sup\{\|E(f(X)\mid\mathcal M) - E(f(X))\|_\infty,\ f \in \Lambda_1\}.$
This last coefficient is the essential supremum of the right-hand side of the conditional Kantorovitch-Rubinstein duality (2.2). Thus in the case (A) we get a coupling scheme directly on the variable $X$ via the Kantorovitch-Rubinstein duality (2.2): there exists a version $X^* \sim X$ independent of $\mathcal M$ such that

$\|E(d(X,X^*) \mid \mathcal M)\|_\infty = \tau_\infty(\mathcal M,X) = \varphi(\mathcal M,X).$

When $d$ is the Hamming distance $d(x,y) = 1_{x\ne y}$ the coefficient $\varphi(\mathcal M,X)$ coincides with the uniform mixing coefficient $\phi(\mathcal M,\sigma(X))$ of Ibragimov, defined for two $\sigma$-algebras $\mathcal M$ and $\mathcal M'$ as:

$\phi(\mathcal M,\mathcal M') = \sup_{M\in\mathcal M,\,M'\in\mathcal M'} |P(M' \mid M) - P(M')|.$

In a more general context than (A), we have $\varphi(\mathcal M,X) \le \tau_\infty(\mathcal M,X)$ and a coupling scheme directly on $X$ is not tractable. Then we do a coupling scheme on the variables $f(X_i)$ for some function $f$: if the sample $(X_1,\dots,X_n)$ is such that the coefficients $\varphi(\mathcal M_j,X_i)$ are finite for $1 \le j < i \le n$ and if $f \in \mathcal F$, then the coupling scheme for $f(X_i)$ follows from the conditional Kantorovitch-Rubinstein duality (2.2) and the relation

$\tau_\infty(\mathcal M_j,f(X_i)) \le \varphi(\mathcal M_j,X_i):$

there exists $f(X_i)^*$ such that $f(X_i)^* \sim f(X_i)$ is independent of $\mathcal M_j$ and

$\|E(|f(X_i)^* - f(X_i)| \mid \mathcal M_j)\|_\infty = \tau_\infty(\mathcal M_j,f(X_i)) \le \varphi(\mathcal M_j,X_i).$

In the case (A) we also have another possible coupling scheme for $f(X_i)$, see Section 5 for practical examples: $f(X_i^*) \sim f(X_i)$ is independent of $\mathcal M_j$ and

$\|E(|f(X_i^*) - f(X_i)| \mid \mathcal M_j)\|_\infty \le \|E(d(X_i^*,X_i) \mid \mathcal M_j)\|_\infty = \tau_\infty(\mathcal M_j,X_i) = \varphi(\mathcal M_j,X_i).$

2.3. Extensions on the product space $\mathcal X^q$, $q \ge 1$. To consider conditional coupling schemes of length $q \ge 1$ we need to extend the notions of weak dependence coefficients to $X = (X_t)_{r \le t < r+q} \in \mathcal X^q$. It depends on the metric $d_q$ chosen for $\mathcal X^q$:

Definition 2.2. For any $q \ge 1$, any $X \in \mathcal X^q$ and any $\sigma$-algebra $\mathcal M$ of $\mathcal A$ let us define the coefficients

$\varphi(\mathcal M,X) = \sup\{\|E(f(X)\mid\mathcal M) - E(f(X))\|_\infty,\ f \in \mathcal F_q\},$

where $\mathcal F_q$ is the set of 1-Lipschitz functions with values in $[-1/2,1/2]$ on $\mathcal X^q$ equipped with the metric $d_q(x,y) = q^{-1}\sum_{i=1}^q d(x_i,y_i)$.

Let us discuss the consequences of the choice of the metric $d_q$:

• The $\tau_\infty$ coupling coefficients on $\mathcal X^q$ are defined for the metric $d_q$, see [7], and for all $f \in \mathcal F$

$\tau_\infty(\mathcal M,(f(X_1),\dots,f(X_q))) \le \varphi(\mathcal M,(X_1,\dots,X_q)).$

Moreover, in the case (A) it holds $\tau_\infty(\mathcal M,X) = \varphi(\mathcal M,X)$.

• If $d$ is the Hamming metric, as $d_q(x,y) \le 1_{x\ne y}$ then $\varphi(\mathcal M,X) \le \phi(\mathcal M,\sigma(X))$. Thus the definition of the weakly dependent coefficients $\varphi$ differs here from the one of Rio in [22] where $\mathcal X^q$ is equipped with $d_\infty(x,y) = \max_{1\le i\le q} d(x_i,y_i)$.
2.4. First application: deviation inequality of Hoeffding type. This application is due to Dedecker and Prieur [8]. Assume that $\mathcal X$ is a Polish space such that $\sup_{x,y} d(x,y) \le 1$, i.e. we are in the case (A). Assume that the coefficients $\varphi(\mathcal M_j,(X_{j+1},\dots,X_n))$ are finite for all $1 \le j \le n-1$ and that $g:\mathcal X^n \to \mathbb R$ satisfies

$|g(x_1,\dots,x_n) - g(y_1,\dots,y_n)| \le \sum_{i=1}^n d(x_i,y_i).$

As we are in the case (A) there exists a coupling scheme $(X^*_{j+1},\dots,X^*_n)$ of $(X_{j+1},\dots,X_n)$ for any $1 \le j \le n-1$ such that, keeping the same notation as in [23]:

$T(g) = \|E(g(X_{j+1},\dots,X_n) \mid \mathcal M_j) - E(g(X_{j+1},\dots,X_n))\|_\infty$

$\quad = \|E(g(X_{j+1},\dots,X_n) - g(X^*_{j+1},\dots,X^*_n) \mid \mathcal M_j)\|_\infty \le (n-j)\,\varphi(\mathcal M_j,(X_{j+1},\dots,X_n)).$

Applying Theorem 1 of [23], if $E(g(X_1,\dots,X_n)) = 0$ then for all $x \ge 0$ it holds:



f(X\, . . . , X n ) > 



2-1 ^ (1 + 2(n - j)<p(Mj, (X j+ i, X n ))) 2 x ] < e~ x . 

3=1 



This deviation inequality of Hoeffding type only differs from the one for independence by a con- 
stant. However, such inequalities are not as satisfactory as Bernstein ones for statistical applica- 
tions. 



3. Deviation inequality around the mean inequality 

Let us give an inequality for the deviation around the mean of $S(f) = \sum_{i=1}^n f(X_i)$ for $f \in \mathcal F$, with $(X_1,\dots,X_n)$ on the metric space $(\mathcal X,d)$ and such that there exists a non increasing sequence $(\delta_r)$ that satisfies

(3.1) $\sup_{1\le j\le n-2r+1} \varphi(\mathcal M_j,(X_{r+j},\dots,X_{2r+j-1})) \le \delta_r$ for all $r \ge 1$.

3.1. A deviation inequality of Bernstein type. Assume with no loss of generality that $E(f(X_1)) = 0$.

Theorem 3.1. For any integer $n$, if there exists $(\delta_r)$ as in (3.1) then

$P\big(S(f) \ge 5.8\sqrt{n\bar\sigma^2_{k^*}(f)x} + 1.5\,k^*x\big) \le e^{-x},$

where $k^* = \min\{1 \le k \le n \,/\, k\delta_k \le \bar\sigma_k^2(f)\}$ and $\bar\sigma^2_{k^*}(f) = \max\{\sigma_k^2(f) \,/\, k^* \le k \le n\}$.

The proof of this Theorem is given in Subsection 6.1. We adopt the convention $\min\emptyset = +\infty$ and the estimate is non trivial when $r\delta_r \to 0$ and $n\delta_n \le \bar\sigma_n^2(f)$, i.e. for not too small values of $n$.

Remark that the variance term $\bar\sigma^2_{k^*}(f)$ is more natural than $\sigma_1^2(f)$ in (1.1), as in the central limit theorem $\bar\sigma^2_{k^*}(f)$ converges to the limit variance as $k^*$ goes to infinity. Before giving some remarks on this Theorem, the next proposition gives estimates of the quantity $\sigma_k^2(f) = k^{-1}\mathrm{Var}\big(\sum_{i=1}^k f(X_i)\big)$.
3.2. The variance terms $\sigma_k^2(f)$. Under suitable assumptions on $(\delta_r)$, it is always possible to obtain rough estimates of $\sigma_k^2(f)$ in terms of $\sigma_1^2(f)$ and $E|f(X_1)|$:

Proposition 3.2. If the condition (3.1) is satisfied then we have for all $k \le n$ the inequality:

4(f) < (^(f) + m\f(x 1 )\ y psX 

See Subsection 6.2 for a straightforward proof of this Proposition. The estimate given in Proposition 3.2 can be rough, for example in the degenerate cases when $\sigma_k^2(f)$ tends to 0 with $k$.

3.3. Remarks on Theorem 3.1. The gaussian behavior around the mean is, up to a universal constant, the same as in the iid case with the more natural variance $\bar\sigma^2_{k^*}(f)$ instead of $\sigma_1^2(f)$ in (1.1). However, in the exponential domain the estimate given in Theorem 3.1 is sometimes less sharp than the one obtained for $\phi$-mixing in Samson [24].

In the non degenerate case $\sigma_k^2(f) \to \sigma^2(f) > 0$, $k^*$ is finite as soon as $r\delta_r \downarrow 0$. The deviation inequality of Theorem 3.1 becomes similar to the one in the iid case (1.1) with the variance term $\sigma^2(f)$ instead of $\sigma_1^2(f)$: there exists $C > 0$ such that for $n$ sufficiently large we have

$P\big(S(f) \ge C(\sqrt{n\sigma^2(f)x} + x)\big) \le e^{-x}$ for all $x > 0$.

However, the estimate of the exponential behavior in Theorem 3.1 may differ from the one of the iid case. For example, for statistical issues it is often assumed that $f$ is chosen depending on $n$ such that $\sigma^2(f_n) \to 0$. Assume that $r\delta_r$ is summable. Using Proposition 3.2 and Jensen's
inequality we have the estimate cff(/ n ) < o-\(f n ) 1 / 2 . If 0f(/ n ) _1 / 2 n <5 n { then for n sufficiently 
large such that &* = min{A; < n / k5k < crl(fn) 1 / 2 } exists, it holds 

> C(^Ja 2 K {f n )nx + k* n x)) < e~ x for all x > 0, with C > 0. 

As $k^* \uparrow \infty$ there is a loss compared with the iid case (1.1). We do not know if this loss may be reduced outside the cases of uniformly mixing processes where (1.1) holds, see Samson [24].

This loss may be reduced when the autocorrelations are controlled, choosing a smaller size of 
blocks fc*. Assume that a\{f n ) < cr 2 (f n ) (such relation is satisfied in the uniformly t/>-mixing 
context). If o- 2 (f n )~ 1 n5 n [ then for n sufficiently large such that fc* = min{/c < n / k6k < 
°"i(/n)} exists, it holds 

P (s(f n ) > C (Ja^Jn)nx + k*xY\ < e~ x for all x > 0, with C > 0. 

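The loss compared with the iid case is due to $k^* \uparrow \infty$. More precisely:

• If $\delta_r = C\delta^r$ for $C > 0$ and $0 < \delta < 1$ then $k^* \approx -\ln(\sigma_1^2(f_n))$,

• If $\delta_r = Cr^{-\delta}$ for $C > 0$ and $\delta > 1$ then $k^* \approx \sigma_1^2(f_n)^{1/(1-\delta)}$.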
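To make the two regimes above concrete, here is a minimal sketch that computes a block length of the form "smallest $k$ with $k\delta_k$ below a given threshold". The helper `k_star`, the threshold value and the two rate functions are hypothetical choices made for illustration; the threshold stands in for whichever variance term is used in the definition of $k^*$.

```python
import math


def k_star(delta, threshold, n):
    """Smallest 1 <= k <= n with k * delta(k) <= threshold.

    Returns math.inf when no such k exists (convention min of empty set = +inf).
    """
    for k in range(1, n + 1):
        if k * delta(k) <= threshold:
            return k
    return math.inf


def geometric(r):
    # delta_r = C * delta^r with the illustrative values C = 1, delta = 1/2
    return 0.5 ** r


def polynomial(r):
    # delta_r = C * r^(-delta) with the illustrative values C = 1, delta = 3
    return r ** -3.0


k_geo = k_star(geometric, 1e-3, 10_000)
k_pol = k_star(polynomial, 1e-3, 10_000)
```

With the threshold $10^{-3}$ the geometric rate gives a block length of 14 and the polynomial rate gives 32, which illustrates the logarithmic versus polynomial growth of the block length as the threshold decreases.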

4. Examples 

We focus on non $\phi$-mixing examples since, for $\phi$-mixing processes, the inequality (1.1) holds up to constants, see Samson [24]. We present dynamical systems that are known to be non $\phi$-mixing processes but that satisfy (3.1) in the case (A). Other examples in the case (A) are presented in Section 5, as a sharpened deviation inequality holds for them, see Theorem 5.1. We also present in this Section continuous functions of Bernoulli shifts, which are examples that are neither $\phi$-mixing nor in the case (A) and thus cannot be treated by the approach of Section 5 and of [24].

4.1. Dynamical systems. Here we are in the case $\mathcal X = [0,1]$ and $d(x,y) = |x-y|$, i.e. in the case (A), and then $\varphi = \tau_\infty$. Since Andrews [2], dynamical systems, defined as stationary solutions of $X_t = T(X_{t+1})$ for all $t$, are classical examples of non-mixing processes. Let us consider $X_t$ the stationary solution of

$X_t = \frac12(X_{t-1} + \xi_t)$

where $(\xi_t)$ is an iid sequence distributed as a Bernoulli(1/2). Then $X_t = T(X_{t+1})$ where $T(x) = 2x$ modulo 1. Even if it is not mixing, easy computation shows that $(X_t)$ satisfies (3.1) with $r\delta_r = (4/9)2^{-r}$ (in fact this specific case also satisfies $r\delta'_r = (4/9)2^{-r}$, see Section 5 for more details).
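The relation between the forward recursion and the map $T$ can be checked numerically. The sketch below uses exact rational arithmetic so that the identity $X_t = T(X_{t+1})$ holds without rounding error; the finite-bit initial value, the path length and the function names are assumptions made for this illustration (a true stationary start would be uniform on $[0,1]$).

```python
import random
from fractions import Fraction


def simulate(n, seed=0, init_bits=32):
    """Simulate X_t = (X_{t-1} + xi_t) / 2 with iid Bernoulli(1/2) innovations."""
    rng = random.Random(seed)
    # X_0 in [0, 1), built from finitely many random bits (approximation of the uniform law)
    x = Fraction(rng.getrandbits(init_bits), 2 ** init_bits)
    path = [x]
    for _ in range(n):
        xi = rng.getrandbits(1)
        x = (x + xi) / 2
        path.append(x)
    return path


def doubling_map(x):
    """T(x) = 2x modulo 1."""
    return (2 * x) % 1


path = simulate(50)
# Reading the path backwards in time, each value is the image of the next one under T.
backward_ok = all(doubling_map(path[t + 1]) == path[t] for t in range(len(path) - 1))
```

The check passes because $2X_{t+1} = X_t + \xi_{t+1}$ with $X_t \in [0,1)$ and $\xi_{t+1} \in \{0,1\}$, so reducing modulo 1 removes the innovation.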

More general examples of dynamical systems are studied in Collet et al. [6]. They obtain estimates of covariance terms, multivariate versions of (2.3), for dynamical systems called expanding maps. It then follows that there exist $C > 0$ and $0 < \rho < 1$ such that (3.1) is satisfied with $r\delta_r = C\rho^r$, see Dedecker and Prieur [8] for more details.

4.2. Continuous functions of Bernoulli shifts. Let us consider a $\phi$-mixing stationary process $(\xi_t)$ in some measurable space $\mathcal Y$ and a sequence $(U_t)$ in the metric space $\mathcal X$ defined as

U t = F(^ f , j G N), 

where $F$ is a measurable function. Assume that the original state space is large enough such that there exists $(\xi'_t)$ distributed as $(\xi_t)$ but independent of it. As in [22], assume that there exists a non increasing sequence $(v_k)$ satisfying almost surely

d(F^ r ,jeN),F^;jeN))<v k , 

with the sequence (tf) satisfying £t = £^ for < t < k and for t > k, ^ = £' t . Finally set 
X t = H{Ut) for some measurable function H : X — > X and t = {1, . . . , n} and denote 

$w_H(x,\eta) = \sup_{d(x,y)\le\eta} d(H(x),H(y)).$

Proposition 4.1. The sample $(X_1,\dots,X_n)$ satisfies (3.1) with

$\delta_r = \inf_{1\le k\le r-1}\{2\phi_{r-k} + E(3w_H(U_0,2v_k)) \wedge 1\}.$

See Subsection 6.3 for the proof of this Proposition. Remark that by construction the process $(X_t)$ is not necessarily in the case (A).

5. In the case (A) with a coupling scheme in $L^\infty$.

In all this section we place ourselves in the case (A) where $\mathcal X$ is a Polish metric space and $d(x,y) \le 1$ for all $x,y \in \mathcal X$. For all $r \ge 1$ a coupling scheme in $L^\infty$ for $(X_i)_{r+j\le i\le 2r+j-1}$, $j \ge 1$, exists when we can construct $(X_i^*)_{r+j\le i\le 2r+j-1}$ distributed as $(X_i)_{r+j\le i\le 2r+j-1}$ and independent of $\mathcal M_j$ such that

(5.1) $\sup_{1\le j\le n-2r+1} \sum_{i=r+j}^{2r+j-1} d(X_i,X_i^*) \le r\delta'_r$ a.s. for all $r \ge 1$.






5.1. A sharper deviation inequality of Bernstein's type. Remark that condition (5.1) with $(\delta'_r)$ implies condition (3.1) with $\delta_r = \delta'_r$. Then we obtain a slightly sharper deviation inequality than in Theorem 3.1.

Theorem 5.1. For $f \in \mathcal F$ such that $E(f(X_1)) = 0$ we have for all $x \ge n\delta'_k$ and all $1 \le k \le n$:


where $h(x) = (1+x)\ln(1+x) - x$ for all $x \ge 0$. Then it holds for all $x > 0$:

$P\big(S(f) \ge 2\sqrt{n\sigma^2_{k^{*\prime}}(f)x} + 1.34\,k^{*\prime}x\big) \le e^{-x},$

with $k^{*\prime} = \min\{1 \le k \le n \,/\, n\delta'_k \le kx\}$.

The proof of this Theorem is given in Subsection 6.4.

Let us compare this deviation inequality with the result of Theorem 3.1. In Theorem 5.1 the variance term $\sigma^2_{k^{*\prime}}(f)$ sharpens $\bar\sigma^2_{k^*}(f)$ and the normal approximation is better here. For the exponential approximation, in both Theorems losses are due to the chosen block sizes. As $k^{*\prime} = \min\{1 \le k \le n \,/\, k\delta'_k \le xk^2/n\}$, if $k\delta'_k$ is decreasing as $k\delta_k$ then $k^{*\prime} \le k^*$ as soon as $n\bar\sigma^2_{k^*}(f) \le k^{*2}x$ or equivalently $\sqrt{n\bar\sigma^2_{k^*}(f)x} \le k^*x$, i.e. as soon as $x$ is in the domain of the exponential approximation. Thus for the normal and the exponential approximations, the deviation inequality in Theorem 5.1 improves the one of Theorem 3.1.

A tradeoff between the generality of the context and the sharpness of the deviation inequalities is made. Even if (5.1) is less general than (3.1), it is satisfied for many examples, see below.

5.2. Bounded Markov Chains. Following Dedecker and Prieur [8], let us consider a stationary Markov chain $(X_t)$ with transition kernel $P$ satisfying, for all $f \in \Lambda_1$, that $P(f) = \int f(y)P(x,dy)$ is a $\kappa$-Lipschitz function with $\kappa < 1$. Then

$r\delta'_r = \kappa^r(1 + \cdots + \kappa^r),$

see [8] for more details.

5.3. Bounded chains with infinite memory. Let the sequence of the innovations $(\xi_t)_{t\in\mathbb Z}$ be an iid process on a measurable space $\mathcal Y$. We define $X = (X_t)_{t\in\mathbb Z}$ as the solution of the equation

(5.2) $X_t = F(X_{t-1},X_{t-2},\dots;\xi_t)$, a.s.

(5.3) $d\big(F((x_k)_{k\in\mathbb N\setminus\{0\}};\xi_0), F((y_k)_{k\in\mathbb N\setminus\{0\}};\xi_0)\big) \le \sum_{j=1}^\infty a_j(F)\,d(x_j,y_j)$, a.s.

for all $(x_k)_{k\in\mathbb N\setminus\{0\}}, (y_k)_{k\in\mathbb N\setminus\{0\}} \in \mathcal X^{\mathbb N\setminus\{0\}}$ such that there exists $N > 0$ with $x_k = y_k = 0$ for all $k > N$, and with $a_j(F) \ge 0$ satisfying

(5.4)
Let $(\xi^*_t)_{t\in\mathbb Z}$ be a stationary sequence distributed as $(\xi_t)_{t\in\mathbb Z}$, independent of $(\xi_t)_{t\le 0}$ and such that $\xi^*_t = \xi_t$ for $t > 0$. Let $(X^*_t)_{t\in\mathbb Z}$ be the solution of the equation

$X^*_t = F(X^*_{t-1},X^*_{t-2},\dots;\xi^*_t)$, a.s.

Using arguments similar to those in Doukhan and Wintenberger [13] we have the following result.

Lemma 5.2. Under condition (5.4) there exists some bounded (by 1/2) stationary process $X$ solution of the equation (5.2). Moreover, this solution satisfies (5.1) with

2r-l ( oo \ 

^ = Eo^- Upy/v + Y, aj (F) . 

j=r { j=p J 

As the proof of this Lemma is similar to the one in [13], it is omitted here.

Many solutions of econometric models may be written as chains with infinite memory. However, the assumption of boundedness is very restrictive for practical models.

5.4. Bernoulli shifts. Solutions of the recurrence equation (5.2) may always be written as $X_t = H((\xi_j)_{j\le t})$ for some measurable function $H:\mathcal Y^{\mathbb N} \to \mathcal X$, where $(\xi_t)$ is an iid process called the innovations. In this very general framework, a coupling version $X^*_t$ is given by $X^*_t = H((\xi^*_j)_{j\le t})$ where $(\xi^*_t)$ is a stationary sequence distributed as $(\xi_t)$, independent of $(\xi_t)_{t\le 0}$ and such that $\xi^*_t = \xi_t$ for $t > 0$. If there exist $a_i \ge 0$ such that

,Ui) with y^q^ < oo, 

i>l i>\ 

and if $\mathcal Y$ is a metric space such that there exists $y \in \mathcal Y$ with $d(\xi_1,y)$ bounded a.s., then $(X_t)$ satisfies (5.1) with

for some C > 0. 

6. Proofs 

This Section contains the proofs. 

6.1. Proof of Theorem 3.1. This section contains the proof of the Bernstein-type estimate on the partial sums $S(f)$ for $f \in \mathcal F$. As in the independent case, the proof follows the Chernoff device. We will proceed using Bernstein's block technique as in [10]. Let us denote by $I_j$ the $j$-th block of indices of size $k$, i.e. $\{(j-1)k+1,\dots,jk\}$, except the last blocks, and let $p$ be an integer such that $2p-1 \le k^{-1}n \le 2p$.

Let us denote by $S_1$ and $S_2$ the sums of even and odd blocks defined as

$S_1 = \sum_{i\in I_{2j},\,1\le j\le p} f(X_i)$ and $S_2 = \sum_{i\in I_{2j-1},\,1\le j\le p} f(X_i).$

From the Cauchy-Schwarz inequality, it holds:

$\ln E[\exp(tS(f))] \le \frac12\big(\ln E\exp(2tS_1) + \ln E\exp(2tS_2)\big).$

Now let us treat in detail the term depending on $S_1$; the same argument applies identically to $S_2$. We want to prove that for any $0 \le t \le 1$, choosing $k = [1/t] \wedge n$ as in [10], it holds:

(6.1) $\ln E(\exp(tS(f))) \le 4nt^2\big(2(e-2)\sigma_k^2(f) + ek\delta_k\big).$

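Denoting $L_m = \ln E\big(\exp(2t\sum_{i\in I_{2j},\,1\le j\le m} f(X_i))\big)$ for any $1 \le m \le p$, we do a recurrence on $m$, remarking that $\ln E(\exp(2tS_1)) = L_p$. From the Hölder inequality, we have for any $2 \le m \le p-1$ the inequalities:

$|\exp(L_{m+1}) - \exp(L_m)\exp(L_1)|$

$\quad \le \exp(L_m)\,\Big\|E\Big(\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big) \,\Big|\, \mathcal M_{2mk}\Big) - E\Big(\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big)\Big)\Big\|_\infty$

$\quad \le \exp(L_m)\,\Big\|E\Big(\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big) - \Big(\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big)\Big)^* \,\Big|\, \mathcal M_{2mk}\Big)\Big\|_\infty,$

where $\big(\exp(2t\sum_{i\in I_{2(m+1)}} f(X_i))\big)^*$ is a coupling version of the variable $\exp(2t\sum_{i\in I_{2(m+1)}} f(X_i))$ independent of $\mathcal M_{2mk}$. From the definition of the coupling coefficients $\tau_\infty$, we know that

$\Big\|E\Big(\Big|\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big) - \Big(\exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big)\Big)^*\Big| \,\Big|\, \mathcal M_{2mk}\Big)\Big\|_\infty \le \tau_\infty\Big(\mathcal M_{2mk}, \exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big)\Big).$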

As $\sum_{i\in I_{2(m+1)}} f(X_i)$ is bounded by $k/2$, the map $x \mapsto \exp(2t\sum_{i=1}^k f(x_i))$ is a Lipschitz function with constant $2kt\exp(kt)$ with respect to $d_k$ and is bounded by $\exp(kt)$ for all $t \ge 0$. We then deduce that for $n^{-1} \le t \le 1$, choosing $k = [1/t] \wedge (n-1)$ and under condition (3.1), we have

$\tau_\infty\Big(\mathcal M_{2mk}, \exp\Big(2t\sum_{i\in I_{2(m+1)}} f(X_i)\Big)\Big) \le 2kte^{kt}\,\varphi(\mathcal M_{2mk},(X_i)_{i\in I_{2(m+1)}}) \le 2e\delta_k.$

Collecting these inequalities, we achieve that

$\exp(L_{m+1}) \le \exp(L_m)(\exp(L_1) + 2e\delta_k).$

The classical Bennett inequality on $\sum_{i\in I_2} f(X_i)$ gives the estimate $\exp(L_1) \le 1 + 4\sigma_k^2(f)/k\,(e^{kt} - kt - 1)$ and as $kt \le 1$ we obtain

$L_{m+1} \le L_m + \ln\Big(1 + \frac{4(e-2)\sigma_k^2(f) + 2ek\delta_k}{k}\Big) \le L_m + \frac{4(e-2)\sigma_k^2(f) + 2ek\delta_k}{k}.$

The $p$ steps of the recurrence lead to the desired inequality

$\ln E(\exp(2tS_1)) \le 2p\,\frac{2(e-2)\sigma_k^2(f) + ek\delta_k}{k}.$

As the same inequality holds for $S_2$ we obtain (6.1) for $n^{-1} \le t \le 1$, remarking that $2pk^{-1} \le 4nt^2$. For $t \le n^{-1}$, the classical Bennett inequality on $S_1$ gives

$\ln E(\exp(2tS_1)) \le 4\sigma_n^2(f)/n\,(e^{nt} - nt - 1).$
Remarking that $e^{nt} - nt - 1 \le (nt)^2\sum_{k\ge0}(nt)^k/(k+2)!$ and $(k+2)! \ge 2\cdot3^k$, we derive that $e^{nt} - nt - 1 \le 2^{-1}(nt)^2\sum_{k\ge0}3^{-k} \le (3/4)(nt)^2$ for $nt \le 1$. Then collecting these bounds, for $t \le n^{-1}$ it holds

$\ln E(\exp(2tS_1)) \le 3n\sigma_n^2(f)t^2 \le 4nt^2\big(2(e-2)\sigma_n^2(f) + en\delta_n\big).$

The same holds for $S_2$ and then (6.1) follows for $0 \le t \le n^{-1}$ and then for all $0 \le t \le 1$.
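The two elementary bounds on $e^u - u - 1$ used in this proof, namely $(e-2)u^2$ and $(3/4)u^2$ for $0 \le u \le 1$, can be checked numerically. The sketch below evaluates the ratio $(e^u - u - 1)/u^2$ on a grid; the grid size and the helper name `ratio` are illustrative choices, and a grid check is of course not a proof.

```python
import math


def ratio(u):
    """(e^u - u - 1) / u^2, extended by its limit 1/2 at u = 0."""
    if u == 0.0:
        return 0.5
    # expm1 avoids cancellation in e^u - 1 for small u
    return (math.expm1(u) - u) / (u * u)


grid = [i / 1000.0 for i in range(1001)]  # points of [0, 1]
max_ratio = max(ratio(u) for u in grid)
```

On this grid the largest ratio is attained at $u = 1$, where it equals $e - 2 \approx 0.718$, which is below $3/4$.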

Note that for $k \ge k^*$ we have $\sigma_k^2(f) \le \bar\sigma^2_{k^*}(f)$ and $k\delta_k \le \bar\sigma^2_{k^*}(f)$ by definition. From (6.1) we achieve

$\ln E(\exp(tS(f))) \le Kn\bar\sigma^2_{k^*}(f)t^2$, for $0 \le t \le k^{*-1}$,

with $K = 4(3e-4)$. Following the Chernoff device, i.e. using $\ln P(S(f) \ge x) \le \ln E(\exp(tS(f))) - tx$ and optimizing in $0 \le t \le k^{*-1}$, we obtain

/ x 2 \ / Kria 2 (/) x \ 
P(S(f) >x)< exp y- 2Kn - 2 ( f) J 1 fc*,<2^(/ ) + exp ^ ^ —J l fc * x>2Xn ^ (/) . 

Easy calculation yields for all x > 

P(5(/) > ^2KnWl(f)l k ^ 2KnwlM) + (Ft + r-^n^a))!^^^^^ < e-x. 

A rough bound + k*~ l Knol*(f) < 3k*x/2 for /c* 2 x > 2Kno\,{f) leads to the result of the 
Theorem. 

6.2. Proof of Proposition 3.2. We have the classical decomposition

$\mathrm{Var}\Big(\sum_{i=1}^k f(X_i)\Big) = k\,\mathrm{Var}(f(X_1)) + 2\sum_{r=1}^{k-1}(k-r)\,\mathrm{Cov}(f(X_1),f(X_{r+1})).$

Now let us consider the coupling scheme $f(X_{r+1})^*$ distributed as $f(X_{r+1})$ but independent of $\mathcal M_1$. Then from Hölder inequality it holds

$\mathrm{Cov}(f(X_1),f(X_{r+1})) = E\big(E(f(X_{r+1}) - f(X_{r+1})^* \mid \mathcal M_1)\,f(X_1)\big).$

But as f(X r+ i) — f(X r+ i)* < 5 r conditionally to Mo we get the desired result. 
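The classical variance decomposition for a stationary sequence, which opens this proof, can be illustrated with a small numerical check. The sketch compares the double sum of covariances with the decomposed form for an assumed autocovariance function; the AR(1)-type autocovariance, the parameter values and the function names are hypothetical and only serve as a test case.

```python
def var_of_sum_direct(gamma, k):
    """Var of a sum of k stationary terms as the double sum of gamma(|i - j|)."""
    return sum(gamma(abs(i - j)) for i in range(k) for j in range(k))


def var_of_sum_decomposed(gamma, k):
    """k * gamma(0) + 2 * sum_{r=1}^{k-1} (k - r) * gamma(r)."""
    return k * gamma(0) + 2 * sum((k - r) * gamma(r) for r in range(1, k))


def gamma_ar1(r, rho=0.5):
    """Illustrative autocovariance rho^r / (1 - rho^2) of a stationary AR(1) sequence."""
    return rho ** r / (1.0 - rho ** 2)


direct = var_of_sum_direct(gamma_ar1, 20)
decomposed = var_of_sum_decomposed(gamma_ar1, 20)
```

Both quantities agree up to floating-point error, since each lag $r \ge 1$ appears exactly $2(k-r)$ times in the double sum.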

6.3. Proof of Proposition 4.1. We adapt the proof of [22]. We are interested in estimating the coefficients $\varphi(\mathcal M_j,(X_{r+j},\dots,X_{2r-1+j}))$ for any $(j,r)$ satisfying $1 \le j \le j+r \le 2r-1+j \le n$. Let us fix $(j,r)$ and denote by $(\xi^k_t)$ a sequence such that $\xi^k_t = \xi_t$ for all $t \ge r+j-k > j$ and $\xi^k_t = \xi'_t$ otherwise. Denote $U^k_t = F(\xi^k_{t-i};\ i \in \mathbb N)$ and $X^k_t = H(U^k_t)$. For any $f \in \mathcal F_r$, we have

(6.2) $f(X_{r+j},\dots,X_{2r-1+j}) - f(X^k_{r+j},\dots,X^k_{2r-1+j}) \le \Big(\frac1r\sum_{i=r+j}^{2r-1+j} d(X_i,X^k_i)\Big) \wedge 1.$

By definition of the modulus of continuity and as $d(U^k_i,U_i) \le v_k$ for any $r+j \le i \le 2r-1+j$, we have

$d(X_i,X^k_i) = d(H(U_i),H(U^k_i)) \le w_H(U^k_i,v_k).$
Remarking that $\big(r^{-1}\sum_{i=r+j}^{2r-1+j} w_H(U^k_i,v_k)\big) \wedge 1$ is a measurable function of $((\xi'_t)_{t<r+j-k},(\xi_t)_{t\ge r+j-k})$ bounded by 1, it holds from the definition of the $\phi$-mixing coefficients:

$E\Big(\Big(r^{-1}\sum_{i=r+j}^{2r-1+j} w_H(U^k_i,v_k)\Big) \wedge 1 \,\Big|\, \mathcal M_j\Big) \le \phi_{r-k} + E\Big(\Big(r^{-1}\sum_{i=r+j}^{2r-1+j} w_H(U^k_i,v_k)\Big) \wedge 1\Big).$

Using again that $d(U^k_i,U_i) \le v_k$, then $w_H(U^k_i,v_k) \le 2w_H(U_i,2v_k)$. By stationarity of $(U_t)$, we obtain

$E\Big(\Big(r^{-1}\sum_{i=r+j}^{2r-1+j} w_H(U^k_i,v_k)\Big) \wedge 1\Big) \le E(2w_H(U_0,2v_k)) \wedge 1.$
So combining these inequalities we obtain for all $1 \le k \le r-1$:

(6.3) $E\big(f(X_{r+j},\dots,X_{2r-1+j}) - f(X^k_{r+j},\dots,X^k_{2r-1+j}) \mid \mathcal M_j\big) \le \phi_{r-k} + E(2w_H(U_0,2v_k)) \wedge 1.$

Using again the definition of the $\phi$-mixing coefficients, as $f$ is bounded by 1 it holds

(6.4) $E\big(f(X^k_{r+j},\dots,X^k_{2r-1+j}) \mid \mathcal M_j\big) - E\big(f(X^k_{r+j},\dots,X^k_{2r-1+j})\big) \le \phi_{r-k}.$

Finally, using again (6.2) and that $d(X_i,X^k_i) \le w_H(U^k_i,v_k)$, by stationarity of $(U_t)$ we obtain

(6.5) $E f(X_{r+j},\dots,X_{2r-1+j}) - E f(X^k_{r+j},\dots,X^k_{2r-1+j}) \le E(w_H(U_0,v_k)) \wedge 1.$

The result of Proposition 4.1 follows from the definition of the $\varphi$-coefficients and the inequalities (6.3), (6.4) and (6.5).



6.4. Proof of Theorem 5.1. Let us keep the same notation as in the proof of Theorem 3.1. The Bennett-type deviation inequality follows classically from the Chernoff device applied with the estimate:

(6.6) $\ln E(\exp(tS(f))) \le \frac{2n\sigma_k^2(f)}{k^2}(\exp(kt) - kt - 1) + n\delta'_k t$ for all $t \ge 0$.

To prove (6.6), let us use the $L^\infty$-coupling scheme and (5.1) to derive for all $1 \le m \le p$:



E / w - E /w; 



*€/ 2 (m + l) 

where, as in Subsection 6.1, $|I_j| = k$ for all $1 \le j \le 2p$ with $2p-1 \le nk^{-1} \le 2p$. Then, for all $t \ge 0$ we have:



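$\exp\Big(2t\sum_{i\in I_{2m}} f(X_i)\Big) \le e^{2tk\delta'_k}\exp\Big(2t\sum_{i\in I_{2m}} f(X^*_i)\Big)$ a.s.

for all $1 \le m \le p$. In particular, by independence of $(X^*_i)_{i\in I_{2m}}$ with $\mathcal M_{2(m-1)k}$ and by stationarity we deduce that

$E\Big(\exp\Big(2t\sum_{i\in I_{2m}} f(X_i)\Big) \,\Big|\, \mathcal M_{2(m-1)k}\Big) \le e^{2tk\delta'_k}E\Big(\exp\Big(2t\sum_{i\in I_2} f(X^*_i)\Big)\Big)$

for all $1 \le m \le p$. Applying this inequality for $m = p$ we have

$E\exp(2tS_1) = E\Big(\exp\Big(2t\sum_{1\le m\le p-1}\sum_{i\in I_{2m}} f(X_i)\Big)\,E\Big(\exp\Big(2t\sum_{i\in I_{2p}} f(X_i)\Big) \,\Big|\, \mathcal M_{2(p-1)k}\Big)\Big)$

$\quad \le e^{2tk\delta'_k}E\Big(\exp\Big(2t\sum_{i\in I_2} f(X^*_i)\Big)\Big)\,E\Big(\exp\Big(2t\sum_{1\le m\le p-1}\sum_{i\in I_{2m}} f(X_i)\Big)\Big).$

Let us do the same reasoning recursively on $m = p-1,\dots,2$ to obtain finally

$\ln E\exp(2tS_1) \le 2(p-1)k\delta'_k t + p\ln E\Big(\exp\Big(2t\sum_{i\in I_2} f(X^*_i)\Big)\Big).$

The classical Bennett inequality gives

$\ln E\Big(\exp\Big(2t\sum_{i\in I_2} f(X^*_i)\Big)\Big) \le \frac{4\sigma_k^2(f)}{k}(\exp(kt) - kt - 1)$

and the inequality (6.6) follows remarking that $4pk^{-1} \le 2nk^{-2}$ and $2(p-1)k \le n$.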

For the Bernstein-type inequality, we use (6.6), the series expansion of the function $\exp(x) - x - 1$ and the fact that $k! \ge 2\cdot3^{k-2}$ for $k \ge 2$ to derive:

$\ln E(\exp(tS(f))) \le \frac{n\sigma_k^2(f)t^2}{1 - (k/3)t} + n\delta'_k t$ for all $0 \le t < 3/k$.

With the same notation as in [18], for $x \ge n\delta'_k$ the Chernoff device leads to:

F(S(f) > x) < exp (^sr*! v 2na 2 (/) 
where $h_1(x) = 1 + x - \sqrt{1+2x}$ for all $x > 0$. Then for all $x \ge 0$ we have

and the desired result follows as $h_1^{-1}(x) = \sqrt{2x} + x$ for all $x > 0$.
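For the reader's convenience, here is a hedged sketch of where the constants 2 and 1.34 of Theorem 5.1 can come from. It assumes that the moment bound used in this step is of sub-gamma type with variance factor $v = 2n\sigma_k^2(f)$ and scale $c = k/3$, and it uses the standard inversion $h_1^{-1}(y) = y + \sqrt{2y}$; the symbols $v$, $c$ and $y$ are introduced here for illustration only.

```latex
% Assumed form of the moment bound, for 0 <= t < 1/c:
%   \ln E \exp\big(t (S(f) - n\delta'_k)\big) \le \frac{v t^2}{2(1 - c t)},
% with v = 2 n \sigma_k^2(f) and c = k/3.
\begin{align*}
P\Big(S(f) \ge n\delta'_k + \sqrt{2 v y} + c\,y\Big) &\le e^{-y}, \qquad y > 0,\\
\sqrt{2 v y} + c\,y &= 2\sqrt{n\sigma_k^2(f)\,y} + \tfrac{k}{3}\,y .
\end{align*}
% If moreover n\delta'_k \le k y (the defining property of k^{*\prime}), then
\[
n\delta'_k + 2\sqrt{n\sigma_k^2(f)\,y} + \tfrac{k}{3}\,y
\;\le\; 2\sqrt{n\sigma_k^2(f)\,y} + \tfrac{4}{3}\,k\,y
\;\le\; 2\sqrt{n\sigma_k^2(f)\,y} + 1.34\,k\,y .
\]
```

Under these assumptions the factor 2 comes from $\sqrt{2v}$ and the factor 1.34 is a rounding of $1 + 1/3$.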

Acknowledgments. The author is grateful to Jerome Dedecker for his helpful comments. 